What do we mean by small worlds?
Simply this:
- we are working with models
- models are simplifications of reality
- they are incomplete
- they are myopic
Probability as counting
- For each possible explanation of the data (i.e., possible value of \(p\))
- Count all of the ways the data could happen (i.e., plausibility of data | \(p_{proposed}\))
- Explanations with more ways to produce the data are more plausible
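The counting recipe above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: a hypothetical bag of four marbles (each blue or white), three draws with replacement observed as blue, white, blue, and a made-up helper named `count_ways`.

```python
from itertools import product

def count_ways(n_blue, bag_size, observed):
    """Count the draw sequences (with replacement) that reproduce `observed`
    for a bag holding `n_blue` blue marbles out of `bag_size`."""
    bag = ["B"] * n_blue + ["W"] * (bag_size - n_blue)
    return sum(1 for path in product(bag, repeat=len(observed))
               if list(path) == list(observed))

observed = ["B", "W", "B"]   # hypothetical data
bag_size = 4                 # hypothetical bag

# One count per possible explanation (proportion of blue marbles, p).
ways = {n_blue / bag_size: count_ways(n_blue, bag_size, observed)
        for n_blue in range(bag_size + 1)}

# Relative plausibility: each count divided by the total count.
total = sum(ways.values())
plausibility = {p: w / total for p, w in ways.items()}

for p in ways:
    print(f"p = {p:.2f}: {ways[p]} ways, plausibility {plausibility[p]:.2f}")
```

Under these assumptions the counts are 0, 3, 8, 9, and 0, so the bag with three blue marbles has the most ways (9 of 20) to produce the data and is the most plausible explanation.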
Probability as counting
Most formulas for probabilities are just short-cuts for counting (or integrating). E.g., the binomial distribution:
\[\begin{align}
\Pr(k \text{ of } n | p) &= \binom{n}{k}p^k(1-p)^{n-k} \\
&= \frac{n!}{k!(n-k)!} p^k(1-p)^{n-k}
\end{align}\]
Note: This formula provides us a short-cut for calculating the probability of observing \(k\) of \(n\) “successes” given a particular value of the parameter \(p\).
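To see that the formula really is a counting shortcut, the sketch below enumerates every possible sequence of successes and failures and adds up the probability of those with exactly \(k\) successes, then compares the result with the closed form. The function names and the values of `k`, `n`, and `p` are assumptions chosen for illustration.

```python
from itertools import product
from math import comb, isclose

def binomial_by_counting(k, n, p):
    """Sum the probability of every length-n success/failure sequence
    that contains exactly k successes."""
    total = 0.0
    for seq in product([1, 0], repeat=n):
        if sum(seq) == k:
            total += p ** k * (1 - p) ** (n - k)
    return total

def binomial_formula(k, n, p):
    """The closed-form shortcut: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

k, n, p = 6, 9, 0.5   # hypothetical values, for illustration only
by_counting = binomial_by_counting(k, n, p)
by_formula = binomial_formula(k, n, p)
print(by_counting, by_formula, isclose(by_counting, by_formula))
```

The enumeration visits all \(2^n\) sequences, which is why the shortcut matters once \(n\) grows.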
Don’t fret! Just think of constraints!
We choose statistical distributions because of:
Theory (someone else figured it out for you!)
Constraints (stuff you know)
- Discrete vs. continuous?
- Bounded? Positive?
- A/symmetric?
We will learn along the way
Bayes’ theorem
\[\begin{align}
\Pr(p_i | k) &= \frac{\text{Probability of the data} \mid p_i \times \text{Prior probability of } p_i}{\text{Probability of the data overall}}
\end{align}\]
But let’s not get caught up in the math! (Watch the 3B1B video… it’s better!)
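For readers who do want the symbols, the verbal formula can be written out for a discrete set of candidate values \(p_1, \dots, p_m\). The index \(j\) and the count \(m\) are notation introduced here for illustration, not taken from the slides.

\[\begin{align}
\Pr(p_i | k) &= \frac{\Pr(k | p_i)\,\Pr(p_i)}{\Pr(k)} \\
&= \frac{\Pr(k | p_i)\,\Pr(p_i)}{\sum_{j=1}^{m} \Pr(k | p_j)\,\Pr(p_j)}
\end{align}\]

The denominator is just the numerator summed over every candidate value, which is what makes the posterior probabilities add up to one.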
A Bayesian modeling perspective
- Generative models are Bayesian models
- start from there… takes longer, but more powerful
- Workflow is important
- Variables: data are observed variables, parameters are unobserved variables
- Distributions: likelihoods and priors are both distributions
- Indexing: there are no random or fixed effects, just indexed variables
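One way to read "generative" is that the model can be run forward to simulate data: first draw the unobserved variable (the parameter), then draw the observed variable (the data) given it. The sketch below borrows the binomial data model and the beta(1, 2) prior from the globe example elsewhere in these slides; the function name, the seed, and \(n = 9\) are assumptions for illustration.

```python
import random

def simulate_globe_tosses(n, rng):
    """Run the generative model forward once:
    draw the unobserved parameter p, then the observed count W."""
    p = rng.betavariate(1, 2)                      # prior draw, p ~ beta(1, 2)
    w = sum(rng.random() < p for _ in range(n))    # data draw, W ~ binomial(n, p)
    return p, w

rng = random.Random(1)     # fixed seed so the run is repeatable
n = 9                      # hypothetical number of tosses
draws = [simulate_globe_tosses(n, rng) for _ in range(1000)]

mean_w = sum(w for _, w in draws) / len(draws)
print(f"mean simulated W: {mean_w:.2f}")
```

Both steps are draws from a distribution, which is the sense in which priors and likelihoods are treated alike.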
Parts of the model
- Describe the distribution of the data | parameters (AKA likelihood)
- \(W \sim \text{binomial}(n, p)\) in globe-flipping example
- \(W\) is the number of times the thumb landed on water
- \(n\) is number of flips
- \(p\) is the parameter we are trying to estimate
- can also involve description of how parameters relate
- E.g., \(\text{logit}(p_i) = \alpha + \beta \times x_i\)
- Description of the distribution of parameters prior to observing the data
- \(p \sim \text{beta}(1, 2)\)
- What is possible (and more/less probable) prior to observing data?
- Some engine to do the calculations
- integration (nope) & conjugate priors (sometimes you get lucky!)
- grid approximation
- quadratic approximation (quap())
- MCMC, WinBUGS, JAGS, Stan (ulam())
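The three parts above (data distribution, prior, engine) can be put together with the simplest engine, grid approximation. This is a minimal sketch rather than the course's own code: it is written in Python instead of using quap() or ulam(), and the data (`w = 6` waters in `n = 9` flips), the grid size, and the function name are assumptions for illustration.

```python
from math import comb

def grid_posterior(w, n, grid_size=101):
    """Grid approximation for W ~ binomial(n, p) with p ~ beta(1, 2)."""
    # Candidate values of p, evenly spaced on [0, 1].
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Prior: the beta(1, 2) density is 2 * (1 - p).
    prior = [2 * (1 - p) for p in grid]
    # Likelihood: binomial probability of w successes in n trials at each p.
    likelihood = [comb(n, w) * p ** w * (1 - p) ** (n - w) for p in grid]
    # Posterior: likelihood times prior, normalized to sum to one.
    unnormalized = [lk * pr for lk, pr in zip(likelihood, prior)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    return grid, posterior

w, n = 6, 9   # hypothetical data
grid, posterior = grid_posterior(w, n)
post_mean = sum(p * q for p, q in zip(grid, posterior))
print(f"posterior mean of p: {post_mean:.3f}")
```

This particular prior happens to be conjugate to the binomial, so the exact posterior is beta(7, 5) with mean \(7/12 \approx 0.583\), which gives a check on the grid result.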